-
Notifications
You must be signed in to change notification settings - Fork 3
#674 Allow specifying to ignore schema changes for selected operations #683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds an operation-level Changes
Sequence Diagram(s)sequenceDiagram
participant Runner as TaskRunnerBase
participant OpDef as OperationDef
participant DF as DataFrame
participant Meta as MetaTable
participant Journal as Journal/SchemaRegistry
Note over Runner,OpDef: Run task with operation context
Runner->>OpDef: read ignoreSchemaChange
Runner->>DF: produce DataFrame from job
Runner->>Meta: fetch registered schema
alt ignoreSchemaChange == true or raw format
Note right of Runner: Skip schema comparison\nreturn (false, [])
Runner->>Journal: do not register schema change
else normal path
Runner->>DF: extract schema
Runner->>Meta: compare schemas
alt differences found
Runner->>Journal: register new schema & record differences
Journal-->>Runner: newSchemaRegistered = true
else no differences
Runner-->>Runner: newSchemaRegistered = false
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (3)
🧰 Additional context used🧠 Learnings (1)📚 Learning: 2025-06-18T08:27:21.504ZApplied to files:
🧬 Code graph analysis (1)pramen/core/src/test/scala/za/co/absa/pramen/core/tests/runner/task/TaskRunnerBaseSuite.scala (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
🔇 Additional comments (9)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In
@pramen/core/src/main/scala/za/co/absa/pramen/core/runner/task/TaskRunnerBase.scala:
- Around line 573-577: Fix the two comment typos in handleSchemaChange: change
"Raw tables do need schema check" to "Raw tables do NOT need schema check" and
"return non changes" to "return no changes". Then decide and implement the
intended behavior for operationDef.ignoreSchemaChange: either keep the current
early return (which skips both schema saving and notifications) and update the
comment to state that ignoreSchemaChange suppresses both tracking and
notifications, or change the logic so only notifications are suppressed while
schema is still saved to the bookkeeper (i.e., remove
operationDef.ignoreSchemaChange from the early return condition and instead
branch later to skip notification logic). Ensure references to
table.format.isRaw and operationDef.ignoreSchemaChange in handleSchemaChange are
updated accordingly.
In
@pramen/core/src/test/scala/za/co/absa/pramen/core/tests/runner/task/TaskRunnerBaseSuite.scala:
- Around line 613-640: The test has a typo in the destructured variable name:
change (hasSchamaChanged, changes) returned by runner.handleSchemaChange(...) to
(hasSchemaChanged, changes) and update the subsequent assertion
assert(!hasSchamaChanged) to assert(!hasSchemaChanged); ensure any other
references in this test to hasSchamaChanged are similarly renamed so the
variable name matches the intended hasSchemaChanged symbol.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
README.mdpramen/core/src/main/scala/za/co/absa/pramen/core/pipeline/OperationDef.scalapramen/core/src/main/scala/za/co/absa/pramen/core/runner/task/TaskRunnerBase.scalapramen/core/src/test/scala/za/co/absa/pramen/core/OperationDefFactory.scalapramen/core/src/test/scala/za/co/absa/pramen/core/pipeline/OperationDefSuite.scalapramen/core/src/test/scala/za/co/absa/pramen/core/tests/runner/task/TaskRunnerBaseSuite.scala
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-18T08:27:21.504Z
Learnt from: yruslan
Repo: AbsaOSS/pramen PR: 611
File: pramen/core/src/main/scala/za/co/absa/pramen/core/metastore/model/MetastoreDependencyFactory.scala:29-29
Timestamp: 2025-06-18T08:27:21.504Z
Learning: In pramen/core/src/main/scala/za/co/absa/pramen/core/metastore/model/MetastoreDependencyFactory.scala, the constant variable name was changed from DATE_UNTIL_EXPR_KEY to DATE_TO_EXPR_KEY, but both constants hold the same string value "date.to". This is a variable name refactoring, not a configuration key change, so it doesn't affect backward compatibility.
Applied to files:
pramen/core/src/main/scala/za/co/absa/pramen/core/pipeline/OperationDef.scalapramen/core/src/main/scala/za/co/absa/pramen/core/runner/task/TaskRunnerBase.scala
🧬 Code graph analysis (1)
pramen/core/src/main/scala/za/co/absa/pramen/core/pipeline/OperationDef.scala (1)
pramen/core/src/main/scala/za/co/absa/pramen/core/utils/ConfigUtils.scala (2)
ConfigUtils(33-612)getOptionBoolean(36-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Test Coverage on Scala 2.12.18
- GitHub Check: Test Spark 3.4.4 on Scala 2.12.20
- GitHub Check: Test Spark 2.4.8 on Scala 2.11.12
- GitHub Check: Test Spark 3.4.4 on Scala 2.13.16
- GitHub Check: Test Spark 3.5.5 on Scala 2.12.20
- GitHub Check: Test Spark 3.3.4 on Scala 2.12.20
- GitHub Check: Test Spark 3.3.4 on Scala 2.13.16
- GitHub Check: Test Spark 3.5.5 on Scala 2.13.16
🔇 Additional comments (8)
pramen/core/src/main/scala/za/co/absa/pramen/core/pipeline/OperationDef.scala (1)
41-41: LGTM! Clean implementation of the new configuration option.The
ignoreSchemaChangefield is properly integrated into theOperationDefmodel with:
- Appropriate placement in the case class constructor
- Config key following naming conventions
- Default value of
falseensuring backward compatibility- Consistent with other boolean flags like
allowParallelandalwaysAttemptAlso applies to: 67-67, 105-105, 152-152
README.md (1)
1774-1776: Documentation is clear and helpful.The new
ignore.schema.changeconfiguration option is well-documented with:
- Clear description of its purpose
- Correct default value shown
- Appropriate placement in the operations configuration section
pramen/core/src/test/scala/za/co/absa/pramen/core/OperationDefFactory.scala (1)
33-33: Test factory correctly updated.The
ignoreSchemaChangeparameter is properly added to the test factory with:
- Correct default value (
false) matching the production code- Appropriate parameter placement
- Proper forwarding to the
OperationDefconstructorAlso applies to: 54-54
pramen/core/src/test/scala/za/co/absa/pramen/core/pipeline/OperationDefSuite.scala (1)
112-112: Excellent test coverage for the new field.The tests thoroughly validate the
ignoreSchemaChangefunctionality:
- Default value (
false) verified across ingestion, transformation, and sink operations- Configured value (
true) properly tested- Multiple operation types and scenarios covered
Also applies to: 131-131, 167-167, 232-232, 272-272
pramen/core/src/main/scala/za/co/absa/pramen/core/runner/task/TaskRunnerBase.scala (1)
362-362: Call sites correctly updated.Both invocations of
handleSchemaChangeare properly updated to pass thetask.job.operationas the newoperationDefparameter, maintaining consistency across the codebase.Also applies to: 396-396
pramen/core/src/test/scala/za/co/absa/pramen/core/tests/runner/task/TaskRunnerBaseSuite.scala (3)
63-445: LGTM! Destructuring updates are correct.The mechanical updates to accommodate the new
OperationDefreturn value fromgetUseCaseare applied consistently across all test cases. The destructuring patterns correctly ignore the new element where it's not needed.
542-611: LGTM! Schema change test updates are correct.All existing
handleSchemaChangetest cases are properly updated to pass the newoperationparameter. The tests maintain their original intent and correctly adapt to the new method signature.
643-687: LGTM! Test helper method correctly updated.The
getUseCasehelper method signature is properly updated to returnOperationDefas the fifth element. TheoperationDefis created with appropriate test configuration and correctly positioned in the return tuple. All call sites have been consistently updated to handle the new return value.
pramen/core/src/main/scala/za/co/absa/pramen/core/runner/task/TaskRunnerBase.scala
Show resolved
Hide resolved
pramen/core/src/test/scala/za/co/absa/pramen/core/tests/runner/task/TaskRunnerBaseSuite.scala
Show resolved
Hide resolved
Unit Test Coverage
Files
|
9d9bc83 to
0be24ea
Compare
Closes #674
Summary by CodeRabbit
New Features
ignore.schema.changeto opt out of schema-change validation for specific operations.Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.